chore(B5e-v2): plan to close memory gap with Luerl #238

Closed
davydog187 wants to merge 1 commit into main from plans/B5e-mutable-regs

Conversation

@davydog187
Contributor

B5e-v2: Dispatcher mutable register storage

Plan: .agents/plans/B5e-v2-dispatcher-mutable-regs.md

Context

B5a-v2 (PR #237) closed most of the memory gap with Luerl on fib(25):

| Path        | Time    | Memory |
|-------------|---------|--------|
| Dispatcher  | 51.6 ms | 263 MB |
| Luerl       | 64.5 ms | 227 MB |
| Interpreter | 73.7 ms | 673 MB |

We're 1.25x faster than Luerl on time but still 1.16x heavier on memory. The deficit traces directly to :erlang.setelement/3 copying the register tuple on every register write — ~80% of dispatcher allocations.
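
The ratios quoted above follow directly from the table. A quick check, as a sketch (the map layout and variable names are illustrative, only the figures come from the benchmark):

```elixir
# Figures copied from the fib(25) table (time in ms, memory in MB).
results = %{
  dispatcher: %{time: 51.6, mem: 263},
  luerl: %{time: 64.5, mem: 227},
  interpreter: %{time: 73.7, mem: 673}
}

ratio = fn a, b -> Float.round(a / b, 2) end

# Dispatcher speedup over Luerl: 64.5 / 51.6
speedup_vs_luerl = ratio.(results.luerl.time, results.dispatcher.time)
# Dispatcher memory relative to Luerl: 263 / 227
memory_vs_luerl = ratio.(results.dispatcher.mem, results.luerl.mem)
# Dispatcher speedup over the interpreter: 73.7 / 51.6
speedup_vs_interpreter = ratio.(results.interpreter.time, results.dispatcher.time)

IO.inspect({speedup_vs_luerl, memory_vs_luerl, speedup_vs_interpreter})
# => {1.25, 1.16, 1.43}
```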

Proposed approach

Replace the dispatcher's immutable tuple register file with process-dict-backed mutable storage. Scope is dispatcher-only — the interpreter and the bytecode format are unchanged. Each setelement becomes Process.put({:disp_reg, idx}, value). Frame save/restore allocates one tuple per call (down from N tuples per N register writes).
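
A minimal sketch of the two storage styles, to make the trade-off concrete. The module and function names are hypothetical and this is not the dispatcher's actual code. Only the `{:disp_reg, idx}` key shape comes from the plan. Note that `put_elem/3` is 0-based while `:erlang.setelement/3` is 1-based.

```elixir
defmodule RegSketch do
  # Hypothetical helpers for illustration only.

  # Current style: registers live in an immutable tuple, so a write
  # returns a new tuple value.
  def write_tuple(regs, idx, value), do: put_elem(regs, idx, value)

  # Proposed style: each register is a process-dictionary entry,
  # so a write does not rebuild the register file.
  def write_pd(idx, value), do: Process.put({:disp_reg, idx}, value)
  def read_pd(idx), do: Process.get({:disp_reg, idx})

  # Frame save: one tuple per call holding the first `n` registers.
  def save_frame(n) do
    0..(n - 1)//1 |> Enum.map(&read_pd/1) |> List.to_tuple()
  end

  # Frame restore: write the snapshot back into the process dictionary.
  def restore_frame(frame) do
    frame
    |> Tuple.to_list()
    |> Enum.with_index()
    |> Enum.each(fn {value, idx} -> write_pd(idx, value) end)
  end
end
```

In this sketch a call site would run `frame = RegSketch.save_frame(n)` before invoking the callee and `RegSketch.restore_frame(frame)` after it returns, which is where the one-tuple-per-call cost comes from.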

Projected savings on fib(22):

  • Today: ~7.2M words allocated for register writes
  • After: ~0.6M words (only per-call frame snapshots)
  • ~92% reduction in register-file allocation

Gates

  • Soft: dispatcher allocation on fib(25) ≤ 130 MB (50% reduction from 263 MB)
  • Stretch: dispatcher allocation ≤ 227 MB (Luerl parity)
  • Time: must hold ≥1.4x speedup vs interpreter (currently 1.43x); ideally ≥1.5x

Key risks (documented in the plan)

  1. Process dict put/get may be slower than tuple setelement on hot paths, so the first measurement could show a time regression. If the speedup drops below 1.3x vs interpreter, abandon and accept the memory deficit.
  2. Reentry safety — nested Lua.eval! calls via :native_func callbacks must snapshot/restore process-dict keys cleanly.
  3. Short-running workloads may slightly regress on memory (per-call snapshot now costs more than the old per-write setelement). Break-even ~5-10 opcodes per call.
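
For risk 2, one possible shape for the snapshot/restore is sketched below. `ReentrySketch.with_saved_regs/1` is a hypothetical helper, and the anonymous function stands in for a nested `Lua.eval!` call. Scanning the whole process dictionary is an assumption made for brevity. A real implementation would more likely track the live register count.

```elixir
defmodule ReentrySketch do
  # Hypothetical: runs `fun` while preserving the caller's
  # {:disp_reg, idx} entries, even if `fun` raises.
  def with_saved_regs(fun) when is_function(fun, 0) do
    # Snapshot only register keys; other process-dict entries are untouched.
    saved = for {{:disp_reg, _} = key, value} <- Process.get(), do: {key, value}

    try do
      fun.()
    after
      # Drop whatever the nested run left behind, then restore the caller's values.
      for {{:disp_reg, _} = key, _} <- Process.get(), do: Process.delete(key)
      Enum.each(saved, fn {key, value} -> Process.put(key, value) end)
    end
  end
end
```

The `try/after` matters here: a nested evaluation that raises would otherwise leave the callee's registers visible to the caller.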

Out of scope

  • Interpreter register file (stays as tuples)
  • NIF-backed mutable storage
  • ETS-backed registers
  • :array module
  • Compile-time register lifetime analysis
  • Register sharing between caller and callee (Luerl-style)
  • Open upvalue opcodes in dispatcher (still fallback)

Status

Blocked on B5a-v2 (PR #237) landing. Once merged, this plan is ready to ship.

Verification

mix test
mix test --only lua53
LUA_BENCH_MODE=full MIX_ENV=benchmark mix run benchmarks/dispatcher_vs_interpreter.exs
MIX_ENV=benchmark mix profile.tprof --type memory -e '...'

Captures the follow-up identified during B5a-v2 review: dispatcher
allocates 263 MB on fib(25) vs Luerl's 227 MB, with ~80% of the
deficit attributable to :erlang.setelement/3 copying the register
tuple on every opcode.

Approach: replace immutable tuple regs with process-dict-backed
mutable storage scoped to the dispatcher only. The interpreter and
the bytecode format stay unchanged. Targets ≤130 MB on fib(25)
(soft gate) and Luerl parity (stretch).

Blocked on B5a-v2 (PR #237) landing.
@davydog187
Contributor Author

Withdrawing this plan after empirical investigation. The 'memory gap with Luerl' (263 MB vs 227 MB allocated on fib(25)) is cumulative bytes allocated, not retained heap. Peak resident memory on fib(28) is 136 KB — the BEAM is already destructively updating the register tuple in our regs = :erlang.setelement(...) ; dispatch(...) tail-call pattern. Replacement plan needed only after measuring whether GC pause time is actually a problem; will draft from real evidence rather than from a Benchee allocation counter.
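
A rough way to see the difference between cumulative allocation and retained heap is sketched below. The `AllocVsRetained` module is hypothetical and is not how the numbers above were produced. It assumes `:erlang.statistics(:garbage_collection)`, which is node-wide, is a good enough proxy on an otherwise idle node, and it reports heap size at the end of the run rather than a true peak.

```elixir
defmodule AllocVsRetained do
  # Runs `fun` in a fresh process and reports two different "memory" numbers:
  # words reclaimed by GC during the run (cumulative garbage, node-wide) and
  # the process heap size after a final collection (roughly what is retained).
  def measure(fun) when is_function(fun, 0) do
    parent = self()

    spawn_link(fn ->
      {_gcs, reclaimed_before, _} = :erlang.statistics(:garbage_collection)
      fun.()
      :erlang.garbage_collect()
      {_gcs, reclaimed_after, _} = :erlang.statistics(:garbage_collection)
      {:total_heap_size, heap_words} = Process.info(self(), :total_heap_size)

      send(parent, {:measured, reclaimed_after - reclaimed_before, heap_words})
    end)

    receive do
      {:measured, reclaimed_words, heap_words} ->
        %{reclaimed_words: reclaimed_words, heap_words_after_gc: heap_words}
    end
  end
end
```

A workload that churns through short-lived terms, for example `AllocVsRetained.measure(fn -> Enum.each(1..200_000, fn i -> List.duplicate(i, 20) end) end)`, would be expected to show a large reclaimed count next to a small heap, which is the pattern described in the comment above.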

davydog187 closed this on May 23, 2026
davydog187 deleted the plans/B5e-mutable-regs branch on May 23, 2026, 21:54